---
id: langchain
title: LangChain
sidebar_label: LangChain
---

import Tabs from "@theme/Tabs";
import TabItem from "@theme/TabItem";
import { Timeline, TimelineItem } from "@site/src/components/Timeline";
import VideoDisplayer from "@site/src/components/VideoDisplayer";

# LangChain

[LangChain](https://www.langchain.com/) is an open-source framework for developing applications powered by large language models, enabling chaining of LLMs with external data sources and expressive workflows to build advanced generative AI solutions.

:::tip
We recommend logging in to [Confident AI](https://app.confident-ai.com) to view your LangChain evaluation traces.

```bash
deepeval login
```

:::

## End-to-End Evals

`deepeval` allows you to evaluate LangChain applications end-to-end in **under a minute**.

<Timeline>

<TimelineItem title="Configure LangChain">

Create a `CallbackHandler` with a list of [task completion metrics](/docs/metrics-task-completion) you wish to use, and pass it to your LangChain application's `invoke` method via the `callbacks` config.

```python title="main.py" showLineNumbers
from langchain.agents import create_tool_calling_agent, AgentExecutor
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.tools import tool
from langchain_openai import ChatOpenAI

from deepeval.integrations.langchain import CallbackHandler
from deepeval.metrics import TaskCompletionMetric

@tool
def multiply(a: int, b: int) -> int:
    """Returns the product of two numbers"""
    return a * b

llm = ChatOpenAI(model="gpt-4o-mini")

agent_prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant that can perform mathematical operations."),
    ("human", "{input}"),
    MessagesPlaceholder("agent_scratchpad"),
])

agent = create_tool_calling_agent(llm, [multiply], agent_prompt)
agent_executor = AgentExecutor(agent=agent, tools=[multiply], verbose=True)

result = agent_executor.invoke(
    {"input": "What is 8 multiplied by 6?"},
    config={"callbacks": [CallbackHandler(metrics=[TaskCompletionMetric()])]}
)

print(result)
```

:::info
Only [Task Completion](/docs/metrics-task-completion) is supported for the LangChain integration. To use other metrics, manually [set up tracing](/docs/evaluation-llm-tracing) instead.
:::
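
If you need a different metric, manual tracing looks roughly like the sketch below. It uses `deepeval`'s `observe` decorator and `update_current_span` helper; the `ask_agent` wrapper is illustrative, and the tracing docs linked above are authoritative for the full setup.

```python
from deepeval.tracing import observe, update_current_span
from deepeval.test_case import LLMTestCase
from deepeval.metrics import AnswerRelevancyMetric

@observe(metrics=[AnswerRelevancyMetric()])
def ask_agent(query: str) -> str:
    # Run the LangChain agent defined earlier in this guide
    answer = agent_executor.invoke({"input": query})["output"]
    # Attach a test case so the metric has an input/output pair to score
    update_current_span(test_case=LLMTestCase(input=query, actual_output=answer))
    return answer
```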

</TimelineItem>
<TimelineItem title="Run evaluations">

Create an `EvaluationDataset` and invoke your LangChain application for each golden within the `evals_iterator()` loop to run end-to-end evaluations.

<Tabs groupId="langgraph">
<TabItem value="synchronous" label="Synchronous">

```python title="main.py" showLineNumbers
from deepeval.dataset import EvaluationDataset, Golden

dataset = EvaluationDataset(goldens=[
    Golden(input="What is 3 * 12?"),
    Golden(input="What is 8 * 6?")
])

for golden in dataset.evals_iterator():
    agent_executor.invoke(
        {"input": golden.input},
        config={"callbacks": [CallbackHandler(metrics=[TaskCompletionMetric()])]}
    )
```

</TabItem>
  <TabItem value="asynchronous" label="Asynchronous">

```python title="main.py" showLineNumbers
import asyncio

from deepeval.dataset import EvaluationDataset, Golden

dataset = EvaluationDataset(goldens=[
    Golden(input="What is 3 * 12?"),
    Golden(input="What is 8 * 6?")
])

for golden in dataset.evals_iterator():
    task = asyncio.create_task(
        agent_executor.ainvoke(
            {"input": golden.input},
            config={"callbacks": [CallbackHandler(metrics=[TaskCompletionMetric()])]}
        )
    )
    dataset.evaluate(task)
```

</TabItem>
</Tabs>

✅ Done. The `evals_iterator` will automatically generate a test run with individual evaluation traces for each golden.

</TimelineItem>
<TimelineItem title="View on Confident AI (optional)">

<VideoDisplayer
  src="https://confident-bucket.s3.us-east-1.amazonaws.com/end-to-end%3Alangchain.mp4"
/>

</TimelineItem>

</Timeline>

:::note
If you need to evaluate individual components of your LangChain application, [set up tracing](/docs/evaluation-llm-tracing) instead.
:::

## Component-level Evals

Using `deepeval`, you can also evaluate individual components of your LangChain application.

### LLM

Define `metrics` in the `metadata` of all the `BaseLanguageModel`s in your LangChain application.

```python title="main.py" showLineNumbers
from langchain_openai import ChatOpenAI
from deepeval.metrics import AnswerRelevancyMetric
...

llm = ChatOpenAI(
    model="gpt-4o-mini", 
    metadata={"metric": [AnswerRelevancyMetric()]}
).bind_tools([get_weather])
```

### Tool

To pass `metrics` to your tools, use DeepEval's LangChain `tool` decorator.

```python title="main.py" showLineNumbers
# from langchain_core.tools import tool
from deepeval.integrations.langchain import tool
from deepeval.metrics import AnswerRelevancyMetric
...

@tool(metric=[AnswerRelevancyMetric()])
def get_weather(location: str) -> str:
    """Get the current weather in a location."""
    return f"It's always sunny in {location}!"
```
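
With the `metadata` metrics and the decorated tool in place, component-level runs can reuse the same `evals_iterator` pattern from the end-to-end section. The sketch below assumes an agent built from the `llm` and `get_weather` defined above, and passes a plain `CallbackHandler`, since the metrics now live on the components themselves.

```python
from deepeval.dataset import EvaluationDataset, Golden
from deepeval.integrations.langchain import CallbackHandler
...

dataset = EvaluationDataset(goldens=[Golden(input="What's the weather in Paris?")])

for golden in dataset.evals_iterator():
    # Component metrics come from the LLM metadata and the @tool decorator,
    # so the handler itself needs no `metrics` argument
    agent_executor.invoke(
        {"input": golden.input},
        config={"callbacks": [CallbackHandler()]},
    )
```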


## Evals in Production

To run online evaluations in production, simply replace the `metrics` argument in `CallbackHandler` with a [metric collection](https://www.confident-ai.com/docs/metrics/metric-collections) string from Confident AI, and push your LangChain agent to production.

:::info
This will automatically evaluate all incoming traces in production with the task completion metrics defined in your [metric collection](https://www.confident-ai.com/docs/metrics/metric-collections).
:::

```python title="main.py" showLineNumbers
result = agent_executor.invoke(
    {"input": "What is 8 multiplied by 6?"},
    config={"callbacks": [CallbackHandler(metric_collection="<metric-collection-name-with-task-completion>")]}
)
```
